OntoSeek: Using Large Linguistic Ontologies for Accessing On-Line Yellow Pages and Product Catalogs

Author

  • Nicola Guarino

Abstract

To effectively exploit the mass of information available today on the Web, the key problem is content matching: relevant information must be selected according to the user's needs, independently of the vocabulary and syntax used to express it. Content matching seems to be an intrinsic problem for textual documents and Web pages: current information retrieval techniques either rely on an encoding process that describes a given item according to a certain perspective or classification scheme, or perform a full-text analysis based on searching for user-specified words. Neither approach guarantees content matching, because an encoded description may reflect only part of the content, and the mere occurrence of a word (or even a sentence) does not necessarily reflect the document's content. For general documents, there does not yet seem to be a much better option than some form of lazy full-text analysis, which leaves us to sift through endless result pages. There is, however, a relevant class of information repositories, namely online yellow pages and product catalogs, where content matching can be both feasible and crucial. In this paper, we first analyze the peculiarities of these repositories with respect to generic Web documents, and then discuss the role that current linguistic ontologies such as WordNet (Miller, 1995) can play in supporting content matching. We then present the architecture of a system called OntoSeek, specifically targeted at online yellow pages and product catalogs. The system is the result of a two-year cooperation between CORINTO (a national research consortium for object technology, a partnership of IBM Semea, Apple Italia, and Selfin Spa) and LADSEB-CNR, as part of a project on the retrieval and reuse of object-oriented software components (Borgo et al., 1997). OntoSeek adopts a language of limited expressiveness for content representation, and exploits a large linguistic ontology based on WordNet (namely SENSUS, developed at ISI-USC) for content matching.
In general, compared with standard word-matching systems, expressing the content structure by means of a simple representation language increases retrieval precision, while adopting a hierarchy of keywords increases both recall and precision. In OntoSeek, the use of a linguistic ontology brings two further advantages: a decoupling between the user's vocabulary and the encoding terminology, and an additional increase in recall and precision due to synonymy handling and sense disambiguation. We conclude that yellow pages and product catalogs constitute a strategic niche where retrieval techniques based on simple representation capabilities and large linguistic ontologies appear to be particularly effective.
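To make the matching strategy described above more concrete, here is a minimal Python sketch. It is not OntoSeek's implementation: the ontology, lexicon, role names and helpers (`HYPERNYM`, `LEXICON`, `Description`, `sense_of`, `is_a`, `matches`, `search`) are all hypothetical, and a dozen hand-written senses stand in for SENSUS. It illustrates the ingredients the abstract names: user words are mapped to senses (so synonyms collapse and an ambiguous word must be resolved to one sense), an item is retrieved when its concept is subsumed by the query concept in the hierarchy, and a small head-plus-roles structure lets a query say which part of an item a word refers to.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology (NOT SENSUS or WordNet): sense -> direct hypernym.
HYPERNYM = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "jaguar_car": "car",
    "bicycle": "vehicle",
    "animal": "entity",
    "jaguar_cat": "animal",
    "color": "entity",
    "red": "color",
    "blue": "color",
}

# Hypothetical lexicon: word -> candidate senses.
# Synonyms share a sense; ambiguous words list several senses.
LEXICON = {
    "car": ["car"], "automobile": ["car"], "auto": ["car"],
    "vehicle": ["vehicle"], "bike": ["bicycle"], "bicycle": ["bicycle"],
    "jaguar": ["jaguar_cat", "jaguar_car"],
    "red": ["red"], "blue": ["blue"], "color": ["color"], "colour": ["color"],
}


@dataclass
class Description:
    """A head concept plus role -> filler pairs (a deliberately limited language)."""
    head: str
    roles: dict = field(default_factory=dict)


def sense_of(word, choice=0):
    """Map a user word to a single sense; `choice` stands in for disambiguation."""
    return LEXICON[word.lower()][choice]


def is_a(sense, ancestor):
    """True if `ancestor` is `sense` itself or one of its hypernyms."""
    while sense is not None:
        if sense == ancestor:
            return True
        sense = HYPERNYM[sense]
    return False


def matches(query, item):
    """An item matches if its head and every queried role filler are subsumed."""
    if not is_a(item.head, query.head):
        return False
    return all(
        role in item.roles and is_a(item.roles[role], filler)
        for role, filler in query.roles.items()
    )


def search(query, catalog):
    """Return the names of all catalog items that match the query."""
    return [name for name, item in catalog.items() if matches(query, item)]


# Hypothetical catalog entries, already encoded as sense-level descriptions.
CATALOG = {
    "red sedan": Description("car", {"color": "red"}),
    "blue sedan": Description("car", {"color": "blue"}),
    "red jaguar": Description("jaguar_car", {"color": "red"}),
    "blue bike": Description("bicycle", {"color": "blue"}),
}

if __name__ == "__main__":
    # Synonymy (automobile = car) plus the hierarchy (jaguar_car is-a car).
    print(search(Description(sense_of("automobile"), {"color": sense_of("red")}), CATALOG))
    # -> ['red sedan', 'red jaguar']
    # A more general concept raises recall.
    print(search(Description(sense_of("vehicle")), CATALOG))
    # -> ['red sedan', 'blue sedan', 'red jaguar', 'blue bike']
    # The two senses of "jaguar" give different answers.
    print(search(Description(sense_of("jaguar", choice=0)), CATALOG))  # -> []
    print(search(Description(sense_of("jaguar", choice=1)), CATALOG))  # -> ['red jaguar']
```

Because the query states that red is the *color* of the car, an item whose description merely mentions "red" elsewhere would not be retrieved, which is the kind of precision gain a plain word match cannot offer. The `choice` argument is only a crude placeholder for however the intended sense is actually selected.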

Similar resources

OntoSeek: Content-Based Access to the Web

Perhaps you're among the many who've entered a search into a Web browser and received pages of links, only some relevant, many not? Dodging this pitfall, barring the way to the Web's wealth of information, requires successful content matching. Current information-retrieval techniques either rely on an encoding process, using a certain perspective or classification scheme, to describe a given item, ...

ProdLight: A Lightweight Ontology for Product Description Based on Datatype Properties

Web pages representing offerings of products and services are a major source of data for Semantic Web-based e-commerce. This data could be useful for numerous applications, e.g. (1) more precise product search engines and shopping bots, (2) aggregation or enrichment of multi-vendor catalogs using public product descriptions, or (3) the automated discovery of additional alternatives based on the...

Yellow Pages on the Semantic Web

Yellow pages catalogs and corresponding directory services on the web are a widely used business concept for helping people to find companies providing services and selling products. When on the web, matching the customer's need with the relevant services offered by companies is typically based on keyword search, table-based search, a list of service categories listed in some order, a hierarch...

DTL's DataSpot: Database Exploration Using Plain Language

DTL’s DataSpot is a database publishing tool that enables non-technical end users to explore a database using free-form plain language queries combined with hypertext navigation. DataSpot is based on a novel representation of data in the form of a schema-less semi-structured graph called a hyperbase. The DataSpot Publisher takes one or more possibly heterogeneous databases, predefined knowledge...

A Multilingual Natural Language Interface for E-Commerce Applications

In this paper we present a multilingual natural language interface architecture, which can be used for accessing online product catalogs and lets users formulate their queries in their native languages. In our interface architecture a rule-based machine-learning module replaces an elaborate semantic analysis component. The learning module learns the correct mappings of a user's input to the cor...

Publication date: 2003